\chapter{Our Research Question and its Theoretical Background} \label{theoretical-background}

%Our claim: not that overspecification helps resolve reference in short term, but more in the long-term (conceptual pacts). It’s obvious that it helps resolve reference bc you give more information.

Overspecification is a topic that has been studied extensively in psycholinguistics but has been largely ignored in computational models for the generation of referring expressions. Empirical studies have found that adult speakers overspecify their references about one-third of the time \citep{Maes_Arts_Noordman_2004,Engelhardt_Bailey_Ferreira_2006}. In Section \ref{hypotheses}, we discuss two possible explanations for the existence of overspecification in reference, along with the empirical and experimental evidence for each. Then, in Section \ref{existing-models}, we review the existing computational models for the generation of referring expressions. Finally, in Section \ref{problematic}, we position ourselves with regard to the existing research and discuss `bridging the gap' between the different disciplines studying reference.


\section{Explanations for Overspecification} \label{hypotheses}

There are two main explanations for why people overspecify as often as they do. The first states that overspecified referring expressions arise because \emph{not} overspecifying is too cognitively costly: filtering out information about a speaker's environment requires too much effort, so speakers include all potentially pertinent information, redundant or not. The second postulates that overspecification is a useful part of communication: sharing more information about a referent enables speakers to establish a common informational base that can be exploited in future communication. We adhere to the second hypothesis and believe that overspecification is included in reference as a tool for successful long-term communication. Below we describe these two competing explanations in more detail, along with the empirical evidence supporting each.

\subsection{Explanation 1: Overspecified REs Impair Comprehension} \label{effort-hypothesis}

In order to understand a reference and identify the object referred to, a listener has to gather a range of information not only about the object itself, but also about the other objects in their environment, which can be potential \emph{distractors} that impede the successful resolution of the reference. For example, in order to identify a book on a bookshelf, the listener may have to use not only the color or size of the book (\emph{``the big black book''}), but also relative position (\emph{``to the left of the blue book''}) or elimination (\emph{``no, not that one''}), among other characteristics of the object. Determining which information is pertinent to refer to an object requires listeners to extract information not only from the physical environment of the interaction but also from the linguistic context of the conversation, and possibly from their interlocutor's knowledge of both. Coordinating all these different sources of information and constantly keeping them updated may be too cognitively overwhelming to occur in real-time language use, which is why, according to explanation (1), overspecification takes place in spontaneous reference production.

According to the \emph{Principle of least effort} \citep{Pechmann_1989,Engelhardt_Bailey_Ferreira_2006}, speech production is an incremental process, and it is not cognitively efficient for speakers to wait for a complete evaluation of all dimensions of an object and a situation before producing a minimal specification, since this would postpone the completion of speech production. Specifying redundant properties in a referential expression is therefore easier than explicitly ignoring them. Pechmann's results support the idea that speakers spontaneously generate referring expressions incrementally, and that decisions regarding the content of a referring expression are reached before speakers have finished observing the whole scene and computing the most efficient minimal reference. Moreover, humans, as opposed to algorithms, do not backtrack and undo their decisions even when these are deemed redundant, so overspecified information stays in spontaneous REs. By planning utterances incrementally and linguistically encoding each piece of information as it becomes available, speakers produce references more quickly, albeit with overspecification, which plays no role in communication itself.

In an experiment by Engelhardt et al. \citeyearpar{Engelhardt_Bailey_Ferreira_2006}, subjects' eye movements were tracked in overspecification and minimal-specification contexts. It was found that although subjects rated overspecified descriptions as equal to minimal ones, their eye movements indicated momentary confusion, and they took longer to interpret and resolve the REs given. Engelhardt et al. concluded that overspecified information was ``detrimental to comprehension performance''. The same group followed up with a second experiment \citep{Engelhardt_Bar_11}, this time measuring not only reaction time but also several ERPs (event-related brain potentials), which have been found to be sensitive to anomalies exhibited during language comprehension. They found that when overspecification was included, there were, on the one hand, slowdowns in reaction time, and on the other, a centroparietal negativity peaking at 500~ms after the use of the redundant adjective. This type of brain activity is similar to N400 ERP activity, which has been linked to semantic-integration problems \citep{Coulsonetal05}. Engelhardt et al. interpreted the timing and distribution of this activity as a sign that additional adjectival modifiers in a non-ambiguous context result in semantic-integration or lexical-access problems.
	
However, measuring the cognitive effort and mental processes involved in RE interpretation is not a simple task: not only do we not know exactly what elementary operations constitute complex mental processes, we also do not know how to evaluate the cognitive cost of interpreting an utterance. Previous research exists on parameters such as the duration of a mental process \citep{sweller_98}, from which several conclusions have been drawn: for instance, since it is impossible to distinguish an intense reflection from one of lesser intensity, the duration of the thinking period is not representative of the effort required. Landragin et al.\ \citeyearpar{LANDRAGIN-2001} proposed that several mental processes are involved in the cognitive effort of processing an RE, and that its comprehension and identification depend on a combination of several processes, including establishing the context of application of the RE, the complexity of the visual scene and the verbal expression, and the history of previous interactions. In view of both the theoretical and empirical data, while the true amount of effort spent on resolving an RE is hard to measure, the reduction of the cognitive cost of RE production remains a possible explanation for overspecification.

\subsection{Explanation 2: Overspecified REs Enhance Comprehension}\label{useful-hypothesis}

An alternative explanation for the phenomenon of overspecification is based on the idea that sharing information facilitates communication, so sharing more information about a referent should facilitate communication further, especially in the long run \citep{WuBoaz2007}. In an experiment by Lane et al.\ \citeyearpar{Wardrow_Lane_2006}, it was shown that although speakers were fully aware of which information was privileged (or shared) and which was not, they still chose to use information that was not pertinent to the exchange. This over-inclusion of objectively `redundant' information can be explained in several ways: for example, via the concepts of \emph{common ground} and \emph{alignment} \citep{Brennan_Clark_1996}.

Brennan and Clark state that, unlike existing theories of reference, which postulate reference specification anchored in the present, they see reference as a phenomenon transcending time, exploiting both past and potential interactions in order to establish a synchronized way of addressing an object. The key idea of their theory is that when a speaker and an addressee have an interaction around a particular object, they \emph{align} their references, meaning that they establish a temporary agreement about how that referent is to be conceptualized. Brennan and Clark also argue that adherence to the Gricean Maxim of Quantity can be overridden by other factors, such as the visual salience or lexical availability of a concept. For example, the concept `cat' is more easily accessible than the concept `animal' and will be used to refer to a cat even if all distractors are non-animals; this is in fact overspecification, since it provides more information than is strictly needed to identify the referent.

In Brennan and Clark's vision, reference is a two-sided process in which both participants adapt their references not only based on precedents or the environment, but also in response to a partner's feedback. Using overspecification initially would therefore permit the alignment to be established more firmly, so that subsequent references can be shortened, using only pertinent elements of the initial reference. In an experiment by Clark and Wilkes-Gibbs \citeyearpar{Clark_Wilkes-Gibbs_1986}, where pairs of subjects referred to abstract complex figures, it was found that they spontaneously started using the same terminology as their conversational partner and, going even further, would form tacit agreements as to the vocabulary used to refer to certain objects, seeking to simplify the noun phrase as much as possible. For example, for one pair of subjects, an initial RE for an abstract figure, \emph{``[the] one [that] looks like a person who's ice skating, except they're sticking two arms out in front''}, was simplified, after six trials, to \emph{``the ice skater''}. In another pair, an RE for the same figure was reduced from \emph{``a person with a leg sticking out back''} to \emph{``a ballerina''}, following a series of successive alignments between the two speakers. This indicates that, with no explicit pact between the participants, they spontaneously aligned themselves to reach a more efficient way of referring to an object. For Clark and Wilkes-Gibbs, this is evidence of the role of overspecification in contributing to the achievement of this reduced RE. By firmly establishing all existing properties of the object, pertinent or not, at the beginning of the exchange, speakers ensure their later availability for ``efficient'' references in further conversations.

In pragmatic terms, \emph{lexical acquisition} can be seen as the most extreme case of alignment, which can span anywhere from using the same vocabulary to referring to objects using the same properties. When a speaker is learning a new language, they are completely \emph{aligned} with their mentor: the learner uses the same vocabulary and grammatical constructions as their teacher because they have an extremely limited choice of linguistic resources to draw on \citep{Atkinson_Churchill_Nishino_Okada_2007}. In other cases of dialogue, it is less trivial to isolate alignment, because it is hard to tell where a speaker's speech resources stem from and to what extent they are aligning themselves with their interlocutor or simply using their own spontaneous vocabulary. While previous studies have shown that alignment in the referential process exists \citep{Goudbeek_and_Krahmer_2012}, a study such as ours, which focuses on lexical acquisition, permits us to control the speaker's knowledge of a language and target the `square one' from which alignment starts.


\section{Existing Models for the Production of Referring Expressions} \label{existing-models}

Describing the production of REs has a practical side as well: the hypotheses described in Section \ref{hypotheses} have been applied in creating computational and cognitive models of RE generation. On the one hand, the psycholinguistic work on whether speakers actually respect the Gricean Maxim of Quantity has had a significant influence on algorithms for RE generation. On the other hand, the practical functioning of RE interpretation has also been formalized in several models of reference; here we focus on Reference Domain Theory, which models reference as a partitioning phenomenon, via a multi-utterance referential process.

\subsection{Classical Computational Models of Referring Expression Generation}

The specification of the amount of information that should be included in a reference was paramount to early algorithms of RE generation, such as the Full Brevity algorithm \citep{Dale_1989}, which yielded only minimally specified distinguishing REs. However, the REs produced by this algorithm were found not to be fully consistent with REs spontaneously produced by human speakers, following corpus analysis by Dale and Reiter \citeyearpar{Dale_Reiter_1995} and results of psychological research by Pechmann \citeyearpar{Pechmann_1989}. These discoveries were behind the subsequent Incremental Algorithm \citep{Dale_Reiter_1995}, which does not attempt to look for an `optimal' set of attributes, but rather goes through a list of attributes in a fixed order determined by the user. Following Pechmann's \emph{Principle of least effort}~\citeyearpar{Pechmann_1989}, the algorithm has no backtracking, so descriptions are not always minimal, but all potentially discriminative properties can be included in the generated RE, at a lower computational cost.
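The selection loop at the heart of the Incremental Algorithm can be sketched as follows. This is a minimal illustration, not the full published algorithm (which also handles, for instance, the head noun and attribute taxonomies); the attribute names, the preference order, and the toy scene are our own invented examples.

```python
# Illustrative sketch of the Incremental Algorithm's selection loop
# (after Dale & Reiter 1995); attributes and scene are invented examples.

def incremental_algorithm(referent, distractors, preference_order):
    """Add attributes in a fixed preference order, with no backtracking,
    until every distractor has been ruled out."""
    description = {}
    remaining = list(distractors)
    for attr in preference_order:
        if not remaining:
            break
        value = referent.get(attr)
        # Include the attribute only if it rules out at least one distractor.
        if value is not None and any(d.get(attr) != value for d in remaining):
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
    # Fail if the referent could not be distinguished from all distractors.
    return description if not remaining else None

# Target: a small black book among a large blue book and a large black book.
target = {"type": "book", "colour": "black", "size": "small"}
scene = [{"type": "book", "colour": "blue", "size": "large"},
         {"type": "book", "colour": "black", "size": "large"}]

# "colour" rules out one distractor and is kept, even though "size" alone
# would have sufficed: the lack of backtracking yields an overspecified RE.
print(incremental_algorithm(target, scene, ["type", "colour", "size"]))
```

In this run the algorithm keeps both colour and size, although the minimal distinguishing RE would be \emph{``the small one''}; this is exactly the non-minimality that the absence of backtracking produces.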

With the appearance of the Incremental Algorithm, the issue of which properties to include in the list of potentially discriminative properties came to the foreground. Extensive work was done on visual salience and especially on integrating it into RE generation algorithms~\citep{Kelleher_Costello_Vangenabith_2005,Gatt_Goudbeek_Krahmer_2011}. Landragin et al.~\citeyearpar{LANDRAGIN-2001} proposed a hierarchy of saliency criteria linked to an algorithm that detects salient objects. They adhered to Gestalt theory~\citep{Kohler_29}, whose Principle of Totality dictates that the most salient form is the one that requires the least sensorial information to be processed successfully. This was incorporated into Landragin's hierarchy, a classification of the properties that make an object more salient in a visual context: category, functionality/luminosity, physical characteristics (size, shape, color, texture), and orientation. This theoretical work on visual salience motivated further work on algorithms of RE generation.

Subsequently, various algorithms of RE generation were created, focusing on different aspects of reference: order of properties \citep{Krahmer-Van-Deemter-2011}, domains of reference \citep{salmon-alt_2000}, saliency \citep{LANDRAGIN-2001}, etc. Gatt and Krahmer \citeyearpar{gatt_krahmer_2010} focused on studying overspecification and, more specifically, alignment in REs, and on applying this to an algorithm of RE generation. They proposed a new model of parallel processes: a preference-based search process based on the Incremental Algorithm and an alignment-based process based on previous interactions of the user with the system. In their system, an RE is built by taking properties from the system's working memory, with overspecification occurring when both processes add properties to the buffer. This algorithm's overspecification rate is much higher than that of the Incremental Algorithm and very similar to that of human data, that is, close to one-third of REs \citep{Maes_Arts_Noordman_2004,Engelhardt_Bailey_Ferreira_2006}. This indicates that it is possible and, indeed, useful to model human communicative behavior in many of its aspects using algorithms, and it is a promising direction for future research. It also has practical applications in AI and human-machine interaction, since any machine system, in order to communicate, has to refer to objects either in the outside world or in its internal representations, as well as to mimic spontaneous speech phenomena such as alignment. In order to achieve this, the practical functioning of the reference process must also be modeled from a cognitive perspective.
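The interplay of the two parallel processes can be illustrated with a minimal sketch. The buffer mechanics, property names, and history format below are our own simplifying assumptions for illustration, not the published model:

```python
# Hedged sketch of a two-process generation model in the spirit of
# Gatt & Krahmer (2010): a preference-based process and an alignment-based
# process both place properties in a shared buffer, and overspecification
# arises when the processes add more than the minimal distinguishing set.

def distinguishing_properties(referent, distractors, preference_order):
    """Preference-based process: add properties, in preference order,
    until no distractor remains (as in the Incremental Algorithm)."""
    chosen, remaining = {}, list(distractors)
    for attr in preference_order:
        if not remaining:
            break
        value = referent.get(attr)
        if value is not None and any(d.get(attr) != value for d in remaining):
            chosen[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
    return chosen

def aligned_properties(referent, history):
    """Alignment-based process: re-use attribute-value pairs already
    used for this referent in earlier turns (invented history format)."""
    return {a: v for a, v in referent.items() if (a, v) in history}

def generate_re(referent, distractors, preference_order, history):
    buffer = {}  # shared working-memory buffer
    buffer.update(distinguishing_properties(referent, distractors, preference_order))
    buffer.update(aligned_properties(referent, history))  # may add redundant ones
    return buffer

# Size alone distinguishes the target, but "black" was used in an earlier
# turn, so the alignment process re-adds it: the RE is overspecified.
target = {"type": "book", "colour": "black", "size": "small"}
distractors = [{"type": "book", "colour": "black", "size": "large"}]
history = {("colour", "black")}
print(generate_re(target, distractors, ["type", "colour", "size"], history))
```

The point of the sketch is structural: neither process alone overspecifies, but merging their outputs into one buffer reproduces redundancy of the kind observed in human data.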


\subsection{Reference Domain Theory and Partitioning} \label{rdt-partitioning}

In order to model the cognitive side of reference production, we focus on Reference Domain Theory \citep{Salmon-Alt_Romary_2009}, because it is the only existing theory that models reference as a \emph{multi-utterance process}, which we believe to be empirically accurate after observing it in existing corpora in our own studies. This theory sees Reference Domains as mental representations of objects in the world, and is built on the assumption that reference is a way of accessing and restructuring these domains. It models reference as a sequential process using \emph{partitioning}, which involves narrowing down the domain of reference with each referring expression given. The role of an RE is therefore to select a specific domain and restructure it via partitioning until only one possible referent remains. This partitioning can be based on information from previous communication, conceptual knowledge about the environment, or perceptual information from the visual scene. The multi-utterance process implies that identification is gradual: a speaker can start with a more general partition and then narrow it down with subsequent REs. For example, one can say \emph{``pass me the book''}, followed by \emph{``the blue one''}, \emph{``the one on your left''}, etc., using a series of REs to narrow down the initial domain of reference until only the referent remains.
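This sequential narrowing can be sketched as successive filters over a set of candidate objects; the scene and the predicates below are invented for illustration and follow the book example just given:

```python
# Hedged sketch of partitioning in Reference Domain Theory: each RE in a
# sequence narrows the current reference domain until one referent is left.

def partition(domain, predicate):
    """One referential act: keep only the objects satisfying the predicate."""
    return [obj for obj in domain if predicate(obj)]

# Invented visual scene serving as the initial reference domain.
domain = [
    {"type": "book", "colour": "blue", "position": "left"},
    {"type": "book", "colour": "blue", "position": "right"},
    {"type": "book", "colour": "red",  "position": "left"},
    {"type": "cup",  "colour": "blue", "position": "left"},
]

# "pass me the book" -> "the blue one" -> "the one on your left"
domain = partition(domain, lambda o: o["type"] == "book")        # 3 candidates
domain = partition(domain, lambda o: o["colour"] == "blue")      # 2 candidates
domain = partition(domain, lambda o: o["position"] == "left")    # 1 candidate
print(domain)
```

Each RE in the sequence corresponds to one partitioning step; once the domain contains a single object, the reference is resolved.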

Despite Reference Domain Theory being the most fitting model of reference resolution for our project, we still expect to make some modifications to it. On the one hand, we consider it too generic, because it does not specify a preferred order for the properties used in referring expressions, which we have found to exist (see below). On the other hand, it also lacks a role for the overspecification phenomenon, which we believe to be a key aspect of reference. We believe that overspecification plays an important role in partitioning because it makes identifying the referent more efficient. While giving the minimal amount of information is theoretically enough to partition the Reference Domain until a single entity is left, in practice there are always several ways to carry out the partitioning, via various properties of the referent. By giving more information during an initial partitioning of the Reference Domain, the speaker ensures that the partitioning is carried out as efficiently as possible and can be exploited later on. This is similar to the alignment phenomenon described previously, since by giving all possible partitions of a Reference Domain the speaker ensures that subsequent REs can use any of the partitions mentioned.

	If we admit that partitioning exists, we can also try to establish a certain regularity in the types of properties used to produce REs. In a taxonomy of referential properties created by the GIVE project \citep{gargett_2010}, the following properties were found to be used in referential expressions:
\begin{enumerate}
	\item \emph{Taxonomic Property}- the type of object referred to. E.g. `button'
	\item \emph{Absolute Property}- property of the object determined without comparing to other objects. E.g. `red'
	\item \emph{Relative Property}- property of the object in relation to similar objects. E.g. `left of'
	\item \emph{Viewer-Centered}- object's location relative to the viewer. E.g. `on your right'
	\item \emph{Micro- and Macro-level landmark intrinsic}- object's location in relation to other objects (either movable or immovable). E.g. `on the wall'
	\item \emph{Distractor intrinsic}- object's location in relation to another object of the same kind. E.g. `near the blue one'
	\item \emph{History of interaction}- reference to the object using elements from previous interactions. E.g. `the same one as last time'
	\item \emph{Visual focus}- reference using visual context and visible elements. E.g. `that one'
	\item \emph{Deduction by elimination}- reference by specifying which objects are not meant. E.g. `not that one' 
\end{enumerate}
Gargett et al. created this list via a corpus study of human instruction giving, collected during an experiment using the GIVE virtual environment. In each pair, one participant gave typed instructions to the other in order to complete the task. To help the instruction follower manipulate the correct set of objects in the world, the instruction giver had to use sequences of REs. It was observed that speakers rarely gave more than three properties of an object at the same time, preferring to produce several utterances when the listener needed additional information to find the object, in accordance with the partitioning phenomenon described previously.

Prior to starting our own experiment, we carried out our own corpus study of the GIVE corpora\footnote{The other corpus studied, although to a lesser extent, was the SCARE corpus \citep{stoia_08}.} in order to observe the multi-utterance aspect of reference giving, which had not previously been studied. In this study, we took all sequences of REs regarding the same referent and analysed them to extract an order of properties used in spontaneous reference production. We found the following order of properties: Absolute, Taxonomic, Landmark Intrinsic, Visual/Relative, Deduction by Elimination, Viewer-Centered, Distractor Intrinsic, History of Interaction\footnote{Although this order was not the same in all cases of multiple-utterance REs, it was the general trend encountered in the corpora studied.}. The existence of an order of properties, even if it was not followed by all speakers at all times, is evidence that a certain regularity of partitioning exists. Similar patterns were previously found in psycholinguistic experiments \citep{Viethen_2008,Pechmann_1989,Arts_Maes_Noordman_Jansen_2011}, for instance the prevalence of absolute over relative properties.
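The kind of ordering analysis described above can be sketched as follows; the annotated sequences are invented, and ranking property types by their mean position in the RE sequence is our own simplifying assumption about the procedure:

```python
# Hedged sketch: extracting a preferred property order from RE sequences
# annotated with property types; the example annotations are invented.
from collections import defaultdict

def preferred_order(sequences):
    """Rank property types by their mean position across RE sequences."""
    positions = defaultdict(list)
    for seq in sequences:
        for i, prop_type in enumerate(seq):
            positions[prop_type].append(i)
    return sorted(positions, key=lambda t: sum(positions[t]) / len(positions[t]))

# Invented sequences: property type of each RE, in utterance order.
sequences = [
    ["Absolute", "Taxonomic", "Landmark"],
    ["Absolute", "Landmark", "Viewer-Centered"],
    ["Taxonomic", "Landmark"],
]
print(preferred_order(sequences))
```

Even on these toy data the earliest-used property types (here, Absolute and Taxonomic) surface at the head of the ranking, mirroring the general trend we observed in the corpora.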
% In our experiment, we aimed to model the role of overspecification in partitioning by making a group of subjects that received the minimal RE up front and the overspecified portion only if they hesitated- we believe that this mimics the mental phenomenon of partitioning via multi-utterance references. 


\section{The Problematic Addressed} \label{problematic}

While much research has been carried out on reference production and overspecification, the import of our project is that it brings together several different aspects of these domains. On one hand, we model RE generation with the partitioning phenomenon stipulated by Reference Domain Theory; on the other, we make more concrete the results of psycholinguistic research regarding the interpretation of overspecified REs; finally, we test alignment in lexical acquisition by conducting the experiment in an L2 acquisition setting. Thanks to the analysis of previously existing corpora of RE generation, as well as the results of RE-generation algorithms, we are able to position ourselves within a novel aspect of these topics.

\subsection{Positioning}

We used the results of previous overspecification studies to plan our own experimental setup. In their experiment, Arts et al.\ \citeyearpar{Arts_Maes_Noordman_Jansen_2011} found that overspecifying REs with color added less time than overspecifying with size, and that providing location information resulted in a shorter identification time. They suggested that this was because color is an intrinsic property of the referent, whereas relative properties are external to the objects. Likewise, Pechmann \citeyearpar{Pechmann_1989} suggested that absolute attributes are preferred over relative ones; the reason is that interpreting relative properties requires complex viewing patterns and multiple glances between objects, because the relevant characteristics must be calculated via a multi-object reference system. This system has to characterize the target object as being different from the objects around it (or compared to an abstract benchmark, which must also be retrieved from memory at a certain cognitive cost), or to compute its relative position with regard to the other objects. For absolute dimensions, only one set of visual data is needed, and the viewing process is simplified. Absolute properties are thus used more predominantly than relative ones to minimize cognitive effort; this carries implications for RE modeling.

While these experimental results seem to argue against overspecification, we believe that this evidence is incomplete, at least in its current form. In the first experiment of Engelhardt et al. \citeyearpar{Engelhardt_Bailey_Ferreira_2006}, it was concluded that subjects were confused when given overspecified REs. However, the REs given were not optimal: for instance, an instruction such as \emph{``Put the apple on the towel in the box''} is syntactically ambiguous, because it is impossible to tell, out of context, where the PPs ``on the towel'' and ``in the box'' should be attached. It is therefore to be expected that listeners consult the visual scene to see how they should interpret the instruction and exhibit eye movements indicating confusion. This is not direct evidence against the utility of overspecification, because the confusion found via eye tracking may be due to the syntactic ambiguity of the RE given rather than a direct consequence of overspecification. Similarly, in the 2011 ERP study by the same group \citep{Engelhardt_Bar_11}, it was also concluded that overspecification impairs performance; however, the concepts of `performance' and `utility' were not differentiated in their interpretation of the results. While it is true that overspecification entails longer processing time, this does not indicate that it is detrimental to performance or cognitive efficiency; it may simply be a slower, yet equally operative, way of processing the incoming information, following the research concerning cognitive effort and processing time (see Section \ref{effort-hypothesis}). We believe that the current experimental evidence needs to be corroborated with additional experiments regarding the contribution of overspecification to the efficiency of long-term communication, and in particular to RE resolution.



\subsection{The Rationale of this Work: Bridging the Gap}

This project is part of a larger effort whose goal is to `bridge the gap' between psycholinguistics and computational linguistics by studying the production of referring expressions\footnote{``Bridging the Gap Between Psycholinguistics and Computational Linguistics'', NWO Vici Project, 2008--2013, http://bridging.uvt.nl/} \citep{Krahmer-Van-Deemter-2011,Goudbeek_and_Krahmer_2012}. Such a bridge has both theoretical and methodological advantages. Psycholinguistics tends to control as many parameters as possible to keep experiments objective, which reduces the spontaneity of the communicative situations studied. Computational linguistics, on the other hand, studies situations of spontaneous communication in all their complexity, which makes it more difficult to filter out possible external influences. Combining the high-quality results of one with the realism of the other would yield a better understanding of the empirical and theoretical functioning of reference production and interpretation. While some collaborations between the two fields exist (for example, Dale and Reiter's inspiration from Pechmann's work \citeyearpar{Dale_Reiter_1995}), the two areas still remain distant. In practice, a better understanding of both sides of reference production can not only permit a better understanding of speech production and comprehension in general, but can also be applied to improving existing technologies of speech recognition and synthesis.

	
 

